Wide context acoustic modeling in read vs. spontaneous speech

Authors

  • Michael Finke
  • Ivica Rogina
Abstract

Context-dependent acoustic models have been applied in speech recognition research for many years, and have been shown to increase recognition accuracy significantly. The most common approach is to use triphones. Recently, several speech recognition groups have started investigating the use of larger phonetic context windows when building acoustic models. In this paper we discuss some of the computational problems arising from wide context modeling (polyphonic modeling) and present methods to cope with these problems. A two-stage, decision-tree-based polyphonic clustering approach is described which implements a more flexible parameter tying scheme. The new clustering approach gave us significant improvement across all tasks (WSJ, SWB, and the Spontaneous Scheduling Task) and across all languages involved (German, Spanish, English). We report recognition results based on the JANUS speech recognition toolkit [2, 8] on two tasks comparing acoustic context phenomena in English read versus spontaneous speech. We used our WSJ 60K recognizer and the JANUS SWB 10K polyphonic recognizer.
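
To make the idea of a wider phonetic context window concrete, the following minimal Python sketch enumerates the context of each phone for a given window width: width 1 yields triphones, width 2 yields quinphones. It is an illustration only and is not taken from the paper or from the JANUS toolkit; the function name `polyphone_contexts`, the padding symbol, and the example phone sequence are all assumptions made for this example.

```python
from typing import List, Tuple


def polyphone_contexts(
    phones: List[str], width: int, pad: str = "<pad>"
) -> List[Tuple[str, ...]]:
    """Return one context window per phone.

    Each window holds `width` phones to the left, the center phone,
    and `width` phones to the right. Utterance edges are filled with
    a hypothetical padding symbol.
    """
    padded = [pad] * width + list(phones) + [pad] * width
    window = 2 * width + 1
    return [tuple(padded[i:i + window]) for i in range(len(phones))]


# Illustrative phone sequence (assumed, not from the paper).
phones = ["k", "ae", "t"]

triphones = polyphone_contexts(phones, width=1)   # +/- 1 neighbor
quinphones = polyphone_contexts(phones, width=2)  # +/- 2 neighbors

for center, tri, quin in zip(phones, triphones, quinphones):
    print(center, tri, quin)
```

With a phone inventory of size N, the number of possible contexts grows roughly as N to the power of the window length, which gives a sense of why wide context modeling calls for parameter tying of the kind the abstract describes.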

Similar resources

Towards automatic transcription of spontaneous presentations

This paper reports various investigations on recognizing spontaneous presentation speech in connection with the “Spontaneous Speech” national project started in 1999. Presentation speech uttered by 10 male speakers of approximately 4.5 hours duration has been recognized. Experimental results show that acoustic and language modeling based on an actual spontaneous speech corpus is far more effect...

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

Syllable-based acoustic modeling for Japanese spontaneous speech recognition

We study a syllable-based acoustic modeling method for Japanese spontaneous speech recognition. Traditionally, mora-based acoustic models have been adopted for Japanese read speech recognition systems. In this paper, syllable-based and mora-based units are clearly distinguished in their definitions, and syllables are shown to be more suitable as an acoustic model for Japanese spontaneous ...

Prosody for Mandarin speech recognition: a comparative study of read and spontaneous speech

In this paper, we present a comparative study between spontaneous speech and read Mandarin speech in the context of automatic speech recognition. We focus on analysis and modeling of prosodic features, based on a unique speech corpus that contains similar amounts of read and spontaneous speech data from the same group of speakers. Statistical analysis is carried out on tone contours and duratio...

Prosodic Patterns of Information Structure in Spoken Discourse: a Preliminary Study of Mandarin Spontaneous Lecture vs. Read Speech

The aim of the study is to explore the prosodic patterns of spontaneous lecture speech vs. read speech, to show where, how, and why these monologues differ, by analyzing perceived emphasis and its acoustic features within and between speech paragraphs. Systematic but distinct patterns are found for both speech types in emphasis distribution across speech and in overall and local tempo modulations. Read ...

Publication date: 1997